Learning to Poke by Poking: Experiential Learning of Intuitive Physics
We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 50K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of the robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and which in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that the forward model can be learned in an abstract feature space, alleviating the need to predict pixels. Our experiments show that this joint modeling approach outperforms alternative methods. We also demonstrate that active data collection using the learned model further improves performance.
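The joint objective described above can be sketched as follows. This is a minimal illustrative example, not the paper's architecture: the feature encoder, forward model, and inverse model are toy linear/tanh maps, the dimensions and the mixing weight `lam` are made up, and the losses are plain mean-squared errors. It only shows how the two objectives share a feature space and how the forward loss is computed on features rather than pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
IMG, FEAT, ACT = 64, 16, 4

# Toy "networks": a shared feature encoder phi, a forward model,
# and an inverse model, all linear for brevity.
W_phi = rng.normal(scale=0.1, size=(FEAT, IMG))
W_fwd = rng.normal(scale=0.1, size=(FEAT, FEAT + ACT))
W_inv = rng.normal(scale=0.1, size=(ACT, 2 * FEAT))

def phi(x):
    """Encode an image x into an abstract feature vector."""
    return np.tanh(W_phi @ x)

def forward_model(feat_t, action):
    """Predict next-step features from current features and an action."""
    return W_fwd @ np.concatenate([feat_t, action])

def inverse_model(feat_t, feat_t1):
    """Predict the action that took feat_t to feat_t1."""
    return W_inv @ np.concatenate([feat_t, feat_t1])

def joint_loss(x_t, x_t1, action, lam=0.5):
    """Weighted sum of the inverse and forward losses.

    The inverse loss supervises the features; the forward loss is
    computed in feature space, so no pixels are predicted.
    """
    f_t, f_t1 = phi(x_t), phi(x_t1)
    inv_loss = np.mean((inverse_model(f_t, f_t1) - action) ** 2)
    fwd_loss = np.mean((forward_model(f_t, action) - f_t1) ** 2)
    return lam * inv_loss + (1.0 - lam) * fwd_loss

# One synthetic transition (x_t, action, x_t1)
x_t, x_t1 = rng.normal(size=IMG), rng.normal(size=IMG)
action = rng.normal(size=ACT)
print(joint_loss(x_t, x_t1, action))
```

In the actual system all three maps are deep networks trained end to end by gradient descent on this combined loss; the sketch only fixes the shapes and the flow of supervision.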
We found that training an inverse model is crucial for learning good representations. (Figure: on the first row, a level from each environment that one-shot PPGS fails to solve; the white arrows represent the policy.) Iterative Model Improvement: in general settings, collecting training trajectories by sampling actions uniformly at random does not grant sufficient coverage of the state space. GLAMOR [34] learns inverse dynamics to achieve visual goals in Atari games. The only difference from PPGS in terms of settings is that we allow GLAMOR to collect data on-policy and for more interactions (2M).
A Forward DNI
In this paper we have focused on the backward (or feedback) DNI, but there is another interesting paradigm between two neural networks, dubbed "forward" DNI. Here we describe this variant of the model and, below, its link to the cerebellum. The difference from backward DNI is that the synthesiser now predicts forward activity rather than backward error gradients. Though more nuanced, the goal of forward DNI as presented in [5] is also to hasten learning. As an example, suppose we have a feedforward network as the main model and equip each layer with a backward synthesiser (one which predicts that layer's error gradients) as well as a forward synthesiser which projects from the original network input x onto each layer.
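The forward-synthesiser idea can be sketched with a toy example. This is an assumption-laden illustration, not the model from [5]: the main network is two tanh layers with made-up sizes, and the forward synthesiser for layer 1 is a single linear map trained by plain SGD to regress from the input x onto layer 1's true activations, so that layer 2 could in principle consume the synthesised activations without waiting for layer 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (not from [5])
IN, H1, H2 = 8, 16, 12

# Main feedforward network: two fixed layers
W1 = rng.normal(scale=0.3, size=(H1, IN))
W2 = rng.normal(scale=0.3, size=(H2, H1))

def layer1(x):
    return np.tanh(W1 @ x)

def layer2(h1):
    return np.tanh(W2 @ h1)

# Forward synthesiser for layer 1: predicts layer 1's activations
# directly from the network input x, decoupling layer 2 from layer 1.
S1 = np.zeros((H1, IN))

def synth_h1(x):
    return S1 @ x

# Train the synthesiser by regressing onto the true activations,
# which serve as its teaching signal.
lr = 0.05
for _ in range(2000):
    x = rng.normal(size=IN)
    target = layer1(x)                       # true layer-1 activity
    S1 += lr * np.outer(target - synth_h1(x), x)  # SGD step on MSE

x = rng.normal(size=IN)
err = np.mean((synth_h1(x) - layer1(x)) ** 2)
print(err)  # residual error of the synthesised activations
```

A linear synthesiser cannot match the tanh layer exactly, but it fits most of the variance here; the point is only the information flow: the synthesised activations let downstream computation proceed before the true forward pass completes.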
- Europe > Austria (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (3 more...)